Functions for quality control. 'snpQC' may be used to count or remove redundant neighboring SNPs and markers with MAF below a given threshold, and to impute missing values. 'cleanREP' identifies and merges duplicate genotypes. The 'reference' function changes the reference genotype. For NAM populations, this function must be used when genotypes are coded according to the reference genome instead of the standard parent.
snpQC(gen,psy=1,MAF=0.05,misThr=0.8,remove=TRUE,impute=FALSE)
cleanREP(y,gen,fam=NULL,thr=0.95)
reference(gen,ref=NULL)
gen - Numeric matrix containing the genotypic data: \(n\) rows of observations and \(m\) columns of molecular markers. SNPs must be coded as 0, 1, 2 for founder homozygous, heterozygous and reference homozygous, respectively. NA is allowed.
psy - Tolerance parameter for markers in Perfect SYmmetry (PSY). This QC step removes identical markers (i.e., markers in full LD) that carry the same information. Default is 1, which removes only SNPs that are 100% identical to their following neighbor.
MAF - Minor Allele Frequency threshold. Default is 0.05. Used to report or remove markers below the MAF threshold. Markers with a standard deviation below the MAF threshold will also be removed.
misThr - Missing value threshold. Default is 0.8, which removes markers with more than 80 percent missing values.
remove - Logical. If TRUE, SNPs flagged by the PSY or MAF checks are removed.
impute - Logical. If TRUE, missing values are imputed using the expected value.
y - Numeric vector (\(n\)) or numeric matrix (\(n\) x \(t\)) of observations describing the trait to be analyzed. NA is allowed.
fam - Numeric vector of length \(n\) indicating which subpopulation (i.e., family) each observation comes from. By default, all observations are assumed to come from the same population.
thr - Threshold above which genotypes are considered identical. Default is 0.95, which merges genotypes that are more than 95 percent identical.
ref - Numeric vector of length \(n\) with elements coded as 0, 1, 2, representing the genotypic information of a new reference genotype. By default, the more frequent allele is assumed to represent the reference genome.
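The marker-level checks described for snpQC can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the toy matrix gen_toy and all variable names are hypothetical, MAF is computed as min(p, 1 - p) with p the mean genotype divided by 2, and the first marker of each identical neighboring pair is the one flagged (which member the package actually drops is not confirmed here).

```r
# Toy genotype matrix (hypothetical): 6 individuals x 5 markers, coded 0/1/2.
gen_toy <- matrix(c(
  0, 0, 2, NA, 0,
  2, 2, 2, NA, 0,
  0, 0, 0, NA, 0,
  2, 2, 1, NA, 0,
  1, 1, 0,  2, 0,
  2, 2, 2, NA, 2
), nrow = 6, byrow = TRUE,
dimnames = list(NULL, paste0("m", 1:5)))

# Proportion of missing calls per marker (to be compared against misThr).
miss_rate <- colMeans(is.na(gen_toy))

# Assumed MAF definition: allele frequency p = mean genotype / 2,
# and MAF = min(p, 1 - p).
p <- colMeans(gen_toy, na.rm = TRUE) / 2
maf <- pmin(p, 1 - p)

# Share of non-missing calls on which each marker agrees with its following
# neighbor (to be compared against psy). The last marker has no neighbor.
m <- ncol(gen_toy)
neighbor_identity <- colMeans(gen_toy[, -m] == gen_toy[, -1], na.rm = TRUE)

# Flags using the default thresholds from the usage line.
flag_missing <- miss_rate > 0.8
flag_maf <- maf < 0.05
flag_psy <- c(neighbor_identity >= 1, FALSE)

# Keep only markers that pass every check.
keep <- !(flag_missing | flag_maf | flag_psy)
gen_filtered <- gen_toy[, keep, drop = FALSE]
```

In this toy case m1 is flagged because it is identical to m2, and m4 is flagged because more than 80 percent of its calls are missing, so gen_filtered retains m2, m3 and m5.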
snpQC - Returns the genomic matrix without missing values, redundant markers or low-MAF markers.
cleanREP - List containing the inputs without replicates. Groups of replicates are replaced by a single observation with the phenotypic expected value. The algorithm keeps the genotypic information of the first individual (genotypic matrix order).
reference - Returns the recoded gen matrix.
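The merging behaviour described for cleanREP can be pictured with a small base R sketch. It is only an approximation under stated assumptions: gen_toy, y_toy and the grouping rule are hypothetical, identity is taken as the share of markers with the same call, each individual is assigned to the first individual it matches above thr, and the fam argument is ignored.

```r
# Toy data (hypothetical): 5 individuals x 8 markers, coded 0/1/2.
# Rows 1 and 3 are replicates, as are rows 2 and 5.
gen_toy <- rbind(
  c(0, 2, 2, 0, 1, 2, 0, 2),
  c(2, 0, 0, 2, 1, 0, 2, 0),
  c(0, 2, 2, 0, 1, 2, 0, 2),
  c(0, 0, 2, 2, 1, 2, 2, 0),
  c(2, 0, 0, 2, 1, 0, 2, 0)
)
y_toy <- c(10, 20, 14, 30, 22)
thr <- 0.95

# Pairwise identity: share of markers on which two individuals agree.
n <- nrow(gen_toy)
identity <- matrix(1, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    identity[i, j] <- mean(gen_toy[i, ] == gen_toy[j, ], na.rm = TRUE)
  }
}

# Simplified grouping: each individual joins the first individual
# (in matrix order) that it matches above the threshold.
group <- apply(identity > thr, 1, function(same) which(same)[1])

# One phenotype per group (mean of the replicates) and the genotype of
# the first member of each group.
y_merged <- as.numeric(tapply(y_toy, group, mean))
gen_merged <- gen_toy[sort(unique(group)), , drop = FALSE]
```

Here the five observations collapse to three: the phenotypes of rows 1 and 3 are averaged, as are those of rows 2 and 5, while row 4 is kept unchanged.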
# NOT RUN {
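library(NAM)   # package providing snpQC, cleanREP, reference and the tpod data
data(tpod)     # example data set; the calls below use its objects y and gen
gen = reference(gen)   # recode markers against the default reference
gen = snpQC(gen = gen, psy = 1, MAF = 0.05, remove = TRUE, impute = FALSE)   # drop redundant and low-MAF markers
test = cleanREP(y, gen)   # merge replicated genotypes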
# }
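The example above leaves impute = FALSE. As a rough picture of what imputing "using the expected value" could look like, the base R sketch below replaces each missing call with the mean of the observed calls for that marker. Treating the marker mean as the expected value is an assumption, and gen_toy and the other names are hypothetical.

```r
# Toy genotype matrix (hypothetical) with missing calls, coded 0/1/2.
gen_toy <- cbind(
  m1 = c(0, 2, NA, 2),
  m2 = c(1, NA, 1, NA),
  m3 = c(0, 0, 2, 2)
)

# Assumption: the expected value of a marker is the mean of its observed calls.
marker_mean <- colMeans(gen_toy, na.rm = TRUE)

# Replace each NA with the mean of its own column.
gen_imputed <- gen_toy
missing <- which(is.na(gen_toy), arr.ind = TRUE)
gen_imputed[missing] <- marker_mean[missing[, "col"]]
```

Imputed entries are generally fractional (for example 4/3 for m1 here), so the matrix is no longer strictly coded as 0, 1, 2 after this step.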